Abstract: Green data centers have become increasingly
popular recently due to their sustainability. The resource
management module within a green data center, which is in charge
of dispatching jobs and scheduling energy, is especially
critical as it directly affects a center’s profit and sustainability.
The main difficulty in managing a green data center’s machine and
energy resources lies in the uncertainty of incoming job requests
and future green energy supplies. Thus, resource scheduling
decisions have to be made in an online manner. Some
heuristic deterministic online algorithms have been proposed in
recent literature. In this paper, we consider online algorithms
for green data centers and introduce a randomized solution with
the objective of maximizing net profit. Competitive analysis is
employed to measure online algorithms’ theoretical performance.
Our algorithm is theoretically sound and it outperforms the
previously known deterministic algorithms in many settings using
real traces. To complement our study, optimal offline algorithms
are also designed.

I. Introduction 

In this paper, we study the problem of scheduling jobs 
and energy in data centers. A data center is a computing 
facility used to house computing systems and their associated 
components such as communication and storage subsystems. 
Usually, a data center stores data and provides computing 
functionalities to its customers. Through charging fees for data 
access and server services, a data center gains revenue [?]. At
the same time, to maintain its running structure, a data center
has to pay operational costs including hardware costs (for
example, those of upgrading computing and storage devices and
air conditioning facilities), electrical bills for power supply, 
network connection costs, and personnel costs. To maximize 
a data center’s net profit, we expect to increase the revenue 
gathered and simultaneously decrease the operational costs 
paid. 

Unfortunately, the ever-increasing power costs and energy
consumption in data centers have brought many serious economic
and environmental problems to our society and have attracted
significant attention recently. As reported, the estimated
annual power costs for U.S. data centers in 2010 reached
as high as 3.3 billion dollars [?]. As a concrete example,
in a modern large-scale data center with 45,000 to 50,000
servers, more than 70% of its operational cost (around half a
billion dollars per year) [?] goes to maintaining the servers
and providing power supply. The energy spending in data
centers in 2014 was $143 billion, growing at a rate of nearly
7% [?]. Targeting both economic and environmental factors,
academic researchers and industrial policy makers have put a
lot of effort into investigating engineering solutions that make
data centers work better without sacrificing service quality
or environmental sustainability.

A growing trend of reducing energy costs as well as 
protecting our clean environments is to fuel a data center using 
renewable energy from wind and solar power. We term this 
type of energy as “green energy” as it comes from renewable 
and non-polluting sources. The amount and availability of 
green energy are usually intermittent and cannot be fully 
predicted in the long term. Another type of energy, called 
“brown energy”, comes from the available electrical grid in 
which the power is produced by carbon-intensive means. 
Brown energy’s sources are much more stable and predictable. 
A data center with both green and brown energy supplies is 
called a green data center. A natural goal of managing energy
resources is to reduce the usage of brown energy where possible,
while maintaining the level of service quality given to jobs.

In this paper, we design job and energy scheduling algorithms
for green data centers. The ultimate goal is to
optimize green and brown energy usage without sacrificing
service quality. Our research builds upon the work of
Goiri et al. [?], Keskinocak et al. [?], and Bansal et
al. [?]. Within this framework, job requests arrive at a data
center over time. An algorithm is to determine whether (job 
admission), when (job processing-window) and where (job- 
machine matching) to schedule a job, as well as which type 
of energy to use in a time slot. Note that different ways 
of assigning jobs to machines and different time slots, and 
feeding machines using different types of energy may result 
in different revenue and operational cost. Define net profit 
as the difference between revenue and operational cost. We 
address the following question: How do we dispatch jobs and 
schedule green/brown energy to maximize net profit? Recall 
that the information on later released jobs and future accessible 
green energy is in general unknown at the moment when the 
current scheduling decision is made, and thus, what we study 
in this paper can be regarded as an online version of a machine 
scheduling problem. 

To evaluate an online algorithm’s performance, we take
two perspectives. In theory, we use the competitive ratio [?]
to measure an online algorithm’s worst-case performance
against a clairvoyant adversary. Competitive analysis has been
used widely to analyze online algorithms in computer science
and operations research [?]. In practice, we conduct simulations
using both real traces and simulated data. The crux
of our algorithmic idea in this paper is to introduce ‘internal
randomness’ in scheduling energy and jobs. As we will
see in the remaining parts of this paper, randomness helps
both theoretically and empirically, particularly in adversarial
settings.

A. Problem formulation 

A data center is regarded as a resource provider which 
provides a set of sharable machines for its clients. The clients, 
regarded as resource consumers, have their jobs processed 
and in turn, pay for the service they get. The data center’s 
revenue management module has the objective of maximizing 
its net profit, defined as the difference between the revenue 
collected from the clients and the operational costs charged 
to maintain the computing and networking system. Here the 
operational costs do not include those for upgrading systems, 
paying personnel, or training operators. We model a data 
center’s revenue management as a decision-making problem of 
scheduling jobs and energy. The components of a computing
system within a data center are pictured in Figure 1, and we
introduce each of them in detail below.


Fig. 1. Components of a solar-powered green data center

We model a data center handling large batch jobs, such as those
that Facebook processes [?], [?]. This model is the same as the
one proposed in [?]. Some statistical data about machine
settings [?] are shown in Table I and Table II. Data centers
usually have job processing times on the order of tens
of minutes [?]. In the two real workload traces in [?],
[?], jobs have average processing times of 0.86 and 1.44
hours respectively, with medians of 12.01 and 1.002 hours
respectively.

TABLE I
Power consumption of physical server state

State    Power consumption (in Watts)
BUSY     240
IDLE     150
SLEEP    10
OFF      0

TABLE II
Time cost of physical server state transition

From state                   To IDLE state (in seconds)
SLEEP (Hibernate/Suspend)    25
OFF                          48
IDLE                         0

Machine resources: Time is discrete. A data center hosts
$M \in \mathbb{Z}^+$ machines (also called nodes) to schedule jobs. At any
time, a machine can process at most one job. To make these
machines function, electrical power is consumed
while jobs are being executed. We normalize the
energy costs so that, without loss of generality,
a machine consumes 1 unit of energy per time slot when
it is processing a job and 0 units otherwise. This simplification
is supported by the negligible machine state-transition time
compared with batch-job sizes.

Job requests: Clients release jobs to a data center to be
processed. Jobs arrive over time. At each time slot, some (possibly
zero) jobs arrive. Each job $j$ has an integer arrival time (also
called release time) $r_j \in \mathbb{Z}^+$, an integer processing time
$p_j \in \mathbb{Z}^+$, an integer deadline $d_j \in \mathbb{Z}^+$, and an integer machine
requirement $q_j \in [1, M]$. Running a job $j$ may require more
than one machine to run simultaneously. The
total machine resource requirement of a job $j$ is defined as
$q_j \times p_j$. The resource management module specifies whether
and where to schedule a job upon its arrival. A successfully
completed job $j$ must be executed in a consecutive time
period without being interrupted, preempted, or migrated [?],
starting at a time between $r_j$ and $d_j - p_j$. We use a
triple $(r_j, p_j, q_j)$ to denote a job $j$.
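
The job model above maps onto a small record type. The following Python sketch is illustrative only: the class name `Job`, its field names, and the two helper methods are assumptions introduced here, not part of the original formulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical record for a job j; field names are illustrative."""
    release: int     # r_j, arrival (release) time slot
    processing: int  # p_j, number of consecutive time slots needed
    deadline: int    # d_j, time by which the job must be completed
    machines: int    # q_j, machines needed simultaneously

    def resource_requirement(self) -> int:
        # Total machine resource requirement q_j * p_j.
        return self.machines * self.processing

    def feasible_starts(self) -> range:
        # A non-preemptive job may start at any s with r_j <= s <= d_j - p_j.
        return range(self.release, self.deadline - self.processing + 1)


job = Job(release=3, processing=2, deadline=8, machines=4)
print(job.resource_requirement(), list(job.feasible_starts()))
```

An empty `feasible_starts()` range means the job cannot meet its deadline at all, which corresponds to rejecting it at admission.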

Time-sensitive revenue: A client pays the data center
for the service it receives. The payoff depends on the job’s
machine resource requirement as well as the service quality.
Consider a job $j = (r_j, p_j, q_j)$. Let $s_j$ denote the starting time
of $j$ and $c_j$ ($c_j := s_j + p_j$) denote $j$’s completion
time [?]. For continuous job streams, we revise the term
stretch [?], [?] to characterize the service quality $l_j$ that a
job $j$ receives. (Recall that in cloud computing and data center
services, a client assumes that it is served immediately
upon delivering its job request.) Define

$$l_j := \frac{p_j}{c_j - r_j},$$

where $c_j \ge r_j + p_j$. Each client pays the data center money
(revenue) proportional to the machine resources consumed:

$$v_j = \begin{cases} \$\beta \times p_j \times q_j, & \text{if } l_j \ge L_j \\ \$0, & \text{otherwise.} \end{cases}$$

The parameter $\beta$ is called the service charging rate [?], and
$L_j$ ($L_j \in (0, 1]$) is the least
service quality that a client $j$ accepts. As specified
by the Amazon EC2 data center service [?], different clients pay
various rates of service fee per unit for different types of
jobs to be processed. To ensure that we receive money $v_j$ from
a job $j$, we need to guarantee $\frac{p_j}{c_j - r_j} \ge L_j$. Thus, we define $d_j$
as the deadline of completing a job $j$, where $d_j := r_j + \frac{p_j}{L_j}$,
to indicate the time by which the job $j$ should be completed
to satisfy its quality requirement.
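
The relation between service quality, deadline, and revenue can be made concrete with a short calculation. The sketch below is a minimal illustration under the definitions in this subsection; the function names and the use of exact fractions are choices made here for clarity, not taken from the paper.

```python
from fractions import Fraction


def service_quality(release, processing, start):
    """Quality l_j = p_j / (c_j - r_j), with c_j = s_j + p_j."""
    completion = start + processing
    return Fraction(processing, completion - release)


def deadline(release, processing, min_quality):
    """Deadline d_j = r_j + p_j / L_j implied by the quality floor L_j."""
    return release + Fraction(processing) / Fraction(min_quality)


def revenue(rate, processing, machines, quality, min_quality):
    """Revenue is rate * p_j * q_j if the quality floor is met, else 0."""
    return rate * processing * machines if quality >= min_quality else 0


L = Fraction(1, 2)  # illustrative least acceptable service quality
on_time = service_quality(release=0, processing=2, start=2)  # completes at 4
late = service_quality(release=0, processing=2, start=3)     # completes at 5
print(deadline(0, 2, L), on_time, late)
print(revenue(1, 2, 3, on_time, L), revenue(1, 2, 3, late, L))
```

In this example the job finishing exactly at its deadline is still paid, while the one finishing one slot later earns nothing, which is the all-or-nothing behaviour the revenue definition describes.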





















Time-sensitive energy costs: Energy is consumed as
machines execute jobs. Usually, a data center
is able to predict green energy quantity only within a 48-hour
scheduling window (see [?] and the references therein).
The costs of different types of energy vary over time. Spending
green energy costs us nothing. Unfortunately, no batteries are
used to store any surplus green energy [?], due to economic
concerns and technical difficulties. The brown energy’s unit
cost is time-sensitive: it is a variable related to on-peak/off-peak
time periods. A unit of brown energy has price
$B^d$ at on-peak (usually daytime) and price $B^n$
at off-peak (usually nighttime). This assumption is
the most common one used in modeling brown electricity
pricing [?]. For instance, the prices charged by an integrated
generation and energy service company in New Jersey [?] are
$B^d = \$0.13$/kWh and $B^n = \$0.08$/kWh.

Objective: Scheduling jobs successfully can earn a data 
center some revenue and paying for any brown energy used 
(to power a data center, along with the limited green energy) 
incurs operational costs. We define 

net profit = revenue - operational cost, 

where revenue is the total money gained through finishing jobs 
and operational cost is the total brown energy cost that the 
service provider consumes to run the jobs. The objective of 
revenue management module of green data centers is to design 
a scheduler to complete all or a subset of the released jobs in 
order to maximize net profit. We call this problem GDC-RM, 
standing for ‘Green Data Center’s Revenue Management’. In 
the remaining parts of this paper, we present combinatorial 
optimization algorithms for GDC-RM. Since job requests
and energy arrival information are unknown beforehand,
GDC-RM is essentially an online decision-making problem.
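
As a small illustration of the objective, net profit can be computed from the completed jobs and the brown energy actually purchased. This is a sketch under the normalization stated earlier (one energy unit per busy machine per time slot); the function name and argument layout are hypothetical.

```python
def net_profit(completed_jobs, brown_usage, rate):
    """Net profit = revenue - operational (brown energy) cost.

    completed_jobs: list of (p_j, q_j) pairs for jobs finished on time.
    brown_usage: list of (units, unit_price) pairs of brown energy bought;
                 green energy is free and therefore not listed.
    rate: service charging rate per machine per time slot.
    """
    revenue = sum(rate * p * q for p, q in completed_jobs)
    cost = sum(units * price for units, price in brown_usage)
    return revenue - cost


# Two completed jobs; 4 units bought on-peak and 1 unit off-peak
# (the prices here are made-up round numbers).
profit = net_profit([(2, 3), (1, 1)], [(4, 0.5), (1, 0.25)], rate=1.0)
print(profit)
```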

B. Related work 

How to schedule green energy in an efficient and effective
manner has been investigated extensively. Although
green energy has the advantages of being cost-effective and
environmentally friendly, it is challenging to use due to
its daily and seasonal variability. Another challenge comes from
customers’ workload fluctuations [?]. These could lead to
a temporal mismatch between the green energy supply and
the workload’s energy demand: a heavy
workload may arrive when the green energy supply is low. One
solution is to “bank” green energy in batteries for later
use. However, this approach incurs large energy losses and high
additional maintenance costs [?]. Thus, a run-time online
algorithm for matching workload and energy is highly
desirable for green data centers.

Two green data center settings have been considered: (1)
centralized data centers (such as in [?], [?], [?]) and (2)
geographically distributed data centers (such as in [24]).
The objectives to optimize are usually classified as (a) to
maximize green energy consumption, (b) to minimize brown
energy cost, and (c) to maximize profits. In addition, some
researchers incorporated dynamic pricing of brown energy
in their models [?], [?], [?]. Unlike the model studied in
this paper, research on geographically distributed data centers
focuses on distributing workloads among data centers in order
to consume the available free green energy or relatively cheaper
brown energy at other data centers. Although geographically
distributed data centers have become popular for big companies
such as Google and Amazon, small-scale centralized data
centers are still important since, as reported, numerous small
and medium-sized companies are the main contributors to the
energy consumed by data centers [?]. Studying the problem of
revenue management for centralized data centers can therefore
have a large impact.

Among the work on centralized data centers, [?], [?]
studied a model which is the same as the one presented in this
paper. [28] aimed to improve green energy usage and
had the goal of reducing brown energy costs. The algorithmic
idea underlying the above-mentioned solutions is greedy, and
they employed algorithms known as First-Fit and Best-Fit.
All prior work focuses on either maximizing green energy
consumption or minimizing brown energy consumption/cost,
except [?], which studied the net profit maximization problem
for centralized data center service providers. [?] proposed
a systematic approach to maximize a green data center’s profit
under a stochastic assumption on the workload: the workload
that they studied is restricted to online service requests
with variable arrival rates. In this paper, we study the profit
maximization problem in a more general setting. In particular,
we do not make any assumptions about the workload’s
stochastic properties. In addition, we incorporate dynamic
brown energy prices in our model, which is a widely used energy
charging scheme in data centers.

II. Online Algorithms 

The offline version of GDC-RM is NP-hard, which can be
proved via a reduction from the well-known NP-hard Knapsack
problem [?], as shown in the Appendix. In reality, job
scheduling in data centers is essentially an online problem.
For the online version of the problem GDC-RM, we first
discuss two widely used heuristic online algorithms, First-Fit
and Best-Fit, and analyze their limitations. Then we propose
a randomized algorithm, Random-Fit. We use competitive
analysis to evaluate an online algorithm’s theoretical
performance. Competitive analysis compares the
output of an online algorithm with that of an optimal offline
clairvoyant algorithm. This unrealistic offline algorithm is
assumed to know all the input information (including the green
energy arrivals and units, brown energy prices, and job arrival
sequences) beforehand.

Definition 1 (Competitive ratio [?]). A deterministic (respectively,
randomized) online algorithm ON is called $k$-competitive
if its (respectively, expected) performance on any
instance is at least $1/k$ times that of an optimal offline
algorithm. The optimal offline algorithm is also called the
(respectively, oblivious) adversary. Let OPT denote the optimal
offline solution of an input. The competitive ratio $k$ is defined as
the maximum, over all inputs, of the ratio between OPT and
$E[ON]$, up to an additive constant $\delta$, where $E[ON]$ is ON’s
(expected) output on an input.

Note that unlike stochastic algorithms, which rely heavily on
statistical assumptions on the input sequence, competitive
online algorithms guarantee the worst-case performance in any
given finite time frame against the adversary. The workload
(input) does not need to satisfy any stochastic assumptions.
Competitive analysis is used when rigorous analysis of online
algorithms is needed and when the input’s stochastic properties
are hard to obtain. For the problem GDC-RM, a green data center’s
workloads are difficult to model [?], and thus competitive
analysis is a suitable metric.

A. Competitive analysis of First-Fit, Best-Fit, and GreenSlot 

First-Fit is a conventional deterministic online scheduler 
which schedules, if possible, a job to the earliest available time 
slots regardless of its energy cost. Although this approach can 
cause minimum delay of a job and maximum throughput, it 
might not achieve a good overall profit due to high brown 
energy cost needed to finish a job in earlier time slots (instead 
of using green energy or less-expensive night-time brown 
energy in later time slots). 

Best-Fit is also a widely used heuristic deterministic
online algorithm. In fact, First-Fit and Best-Fit have been
used extensively in online 1D bin-packing [?] and 2D bin-packing
problems [?]. The Best-Fit algorithm locates the
most ‘cost-efficient’ time slots to schedule a job. It picks
the best time interval to schedule a job in a myopic way
and does not take later job arrivals or energy supplies into
account. As pointed out in [?], Best-Fit may reject more jobs
or miss more deadlines than First-Fit does. The reason is
that Best-Fit always delays jobs to the
most cost-efficient time slots regardless of the future workload
for those time slots. As a result, some jobs may fail to be
scheduled due to deadline constraints, and thus the profit is
harmed. GreenSlot [?] is a variant of Best-Fit; a heuristic
modification is made to avoid rejecting future-arriving jobs due
to delaying the scheduling of current jobs. GreenSlot adds a penalty
for postponing jobs to time slots that are likely to
cause a job to miss its deadline. This penalty is a manually tuned
parameter fitted to particular job sets; thus it is workload
dependent, and it can neither guarantee improved worst-case
profits nor serve as a universal algorithm for handling various
job requests.
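
The difference between the two heuristics can be seen in a toy model. The sketch below is an illustration, not the schedulers evaluated in the paper: it assumes each job occupies the whole cluster (so a time slot is either busy or free), represents energy as a per-slot unit price list with 0 meaning green energy, and uses hypothetical function names throughout.

```python
def window_cost(prices, start, length):
    """Energy cost of running for `length` slots beginning at `start`."""
    return sum(prices[start:start + length])


def feasible_starts(busy, release, deadline, length):
    """Start slots s with release <= s <= deadline - length and no busy slot."""
    return [s for s in range(release, deadline - length + 1)
            if not any(busy[s:s + length])]


def first_fit(prices, busy, release, deadline, length):
    """Earliest feasible start, ignoring energy cost; None means reject."""
    starts = feasible_starts(busy, release, deadline, length)
    return starts[0] if starts else None


def best_fit(prices, busy, release, deadline, length):
    """Cheapest feasible start (ties broken by earliest); None means reject."""
    starts = feasible_starts(busy, release, deadline, length)
    if not starts:
        return None
    return min(starts, key=lambda s: (window_cost(prices, s, length), s))


# Slots 0-1 on-peak, slot 2 off-peak, slot 3 powered by green energy.
prices = [0.13, 0.13, 0.08, 0.0]
busy = [False, False, False, False]
print(first_fit(prices, busy, 0, 4, 1), best_fit(prices, busy, 0, 4, 1))
```

First-Fit runs the job immediately at the on-peak price, while Best-Fit waits for the green slot and thereby occupies it, which is exactly what can force a later job with a tight deadline to be rejected.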

We analyze the competitive ratios of First-Fit and Best-Fit in
the following. First we introduce some notation appearing in
the theorems below. Specifically, we normalize the costs of
green energy and brown energy. According to the definition
of profit, a job $j$ with processing time $p_j$ and machine
requirement $q_j$ has profit $c \cdot p_j \cdot q_j - P(t)$, where $c$ is the
service charging rate and $P(t)$ takes the
value 0 (for green energy), $B^d$ (for on-peak brown energy),
or $B^n$ (for off-peak brown energy) when the job
is processed using the respective type of energy. $P(t)$ is integrated
over the time when the machines process $j$. If all jobs
have the same processing time and machine requirements, then
we normalize the profit as

$$\frac{c \cdot p_j \cdot q_j - P(t)}{c \cdot p_j \cdot q_j} = 1 - \frac{P(t)}{c \cdot p_j \cdot q_j}.$$

In our proofs below, we generate instances such that each job
is processed using only one type of energy by the particular
algorithm. Thus, for ease of presenting the competitive ratios, we
define $1 - \frac{P(t)}{c \cdot p_j \cdot q_j}$ as $v_{on}$, $v_{off}$, or $v_g$ as follows:

$$1 - \frac{P(t)}{c \cdot p_j \cdot q_j} = \begin{cases} v_{on}, & \text{if only on-peak brown energy is used to schedule } j \\ v_{off}, & \text{if only off-peak brown energy is used to schedule } j \\ v_g, & \text{if only green energy is used to schedule } j \end{cases}$$

Note that the normalized profit $1 - \frac{P(t)}{c \cdot p_j \cdot q_j}$ takes a value in
$(0, 1]$. Since on-peak brown energy is
more expensive than off-peak brown energy, and green energy has
cost 0, we have $0 < v_{on} < v_{off} < v_g = 1$. Also, jobs with
the same processing times and the same machine requirements
have the same values for $v_{on}$, $v_{off}$, and $v_g$.

Theorem 1. The lower bound of the competitive ratio of First-Fit
is $\max\left\{\frac{v_g}{v_{on}}, \frac{v_{off}}{v_{on}}\right\}$.

Proof: To prove the lower bound, we create an input
instance. Let OPT denote an optimal offline algorithm. We
assume each job has processing time $p_j = 1$ and
machine requirement $q_j = M$. We use $(r, d)$ to denote a job
with release time $r$ and deadline $d$. We have $M$ machines.

Assume there are two daytime time slots $t_1$ and $t_2$, with
0 and $M$ green energy units arriving at them respectively.
Assume there is only one job $j = (t_1, t_2)$ arriving. First-Fit
schedules $j$ at time $t_1$, earning a profit $v_{on}$. OPT schedules
$j$ at time $t_2$, achieving a profit $v_g$. The competitive ratio is
$\frac{OPT}{FF} = \frac{v_g}{v_{on}}$.

If $t_1$ is on-peak and $t_2$ is off-peak, then we assume
that no green energy arrives at either time slot. Using the same
analysis approach, we get the competitive ratio $\frac{OPT}{FF} = \frac{v_{off}}{v_{on}}$.
Therefore, we conclude that First-Fit has a competitive ratio
of at least $\max\left\{\frac{v_g}{v_{on}}, \frac{v_{off}}{v_{on}}\right\}$. ■

Theorem 2. The lower bound of the competitive ratio of Best-Fit
is $\max\left\{1 + \frac{v_{on}}{v_{off}}, 1 + \frac{v_{off}}{v_g}\right\}$.

Proof: We construct an input instance as a
lower bound example. We assume all arriving jobs have
processing time $p_j = 1$ and machine requirement $M$ (no two
jobs can be executed simultaneously in the same time slot).

Assume there are two time slots $t_1$ and $t_2$; $t_1$ is on-peak
while $t_2$ is off-peak. No green energy arrives
at either time slot. Assume there are two jobs released,
$j_1 = (t_1, t_2)$ and $j_2 = (t_2, t_2)$.

Best-Fit will delay job $j_1$ to be scheduled at time $t_2$,
resulting in a deadline conflict between jobs $j_1$ and $j_2$, and
thus gains only profit $v_{off}$. OPT will schedule $j_1$ and $j_2$ at
times $t_1$ and $t_2$ respectively, gaining a profit $v_{on} + v_{off}$. Thus
the competitive ratio is $\frac{OPT}{BF} = 1 + \frac{v_{on}}{v_{off}}$.

If $t_1$ is off-peak and $t_2$ is on-peak, then we assume
that 0 and $M$ units of green energy arrive at times $t_1$ and
$t_2$ respectively. Using the same analysis approach, we get the
competitive ratio $\frac{OPT}{BF} = 1 + \frac{v_{off}}{v_g}$. We conclude that Best-Fit
has a competitive ratio of at least $\max\left\{1 + \frac{v_{on}}{v_{off}}, 1 + \frac{v_{off}}{v_g}\right\}$. ■

Based on the above analysis and recalling $0 < v_{on} < v_{off} <
v_g = 1$, we have the following result.

Corollary 1. Deterministic algorithms First-Fit and Best-Fit,
with or without job preemption, cannot have competitive ratios
strictly better than 2, even for a restricted case in which all
jobs have the same length.
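
To get a feel for the two lower bounds, they can be evaluated for concrete normalized profits. The sketch below simply codes the expressions from Theorem 1 and Theorem 2; the function names and the sample values of $v_{on}$ and $v_{off}$ are illustrative assumptions.

```python
def first_fit_lower_bound(v_on, v_off, v_g=1.0):
    """Theorem 1: max{v_g / v_on, v_off / v_on}."""
    return max(v_g / v_on, v_off / v_on)


def best_fit_lower_bound(v_on, v_off, v_g=1.0):
    """Theorem 2: max{1 + v_on / v_off, 1 + v_off / v_g}."""
    return max(1 + v_on / v_off, 1 + v_off / v_g)


# Moderate prices: on-peak energy eats 75% of revenue, off-peak 50%.
print(first_fit_lower_bound(0.25, 0.5), best_fit_lower_bound(0.25, 0.5))

# Thin on-peak margin and cheap off-peak energy: the First-Fit bound grows
# without limit as v_on -> 0, and the Best-Fit bound approaches 2 as v_off -> 1.
print(first_fit_lower_bound(0.01, 0.99), best_fit_lower_bound(0.01, 0.99))
```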

As shown above, First-Fit and Best-Fit can have arbitrarily bad
competitive ratios. Even for the special case in which all
jobs have the same processing times and the same node
requirements, First-Fit and Best-Fit have competitive ratios no
better than 2. The crux of the competitive analysis is as follows.
On one hand, if we schedule a job regardless of its own
energy cost, then we favor the algorithm First-Fit, as it leaves
room for later-arriving jobs to be scheduled. On the other hand,
if we schedule a job considering its energy cost, then this job
may be scheduled at a later time when energy is cheaper (e.g.,
the green energy has run out for now and green energy is predicted
to be available in the future). The potential risk is that Best-Fit
may prevent admitting and completing a later-released job.
Such a deterministic online algorithm is then pessimistic in
terms of competitive ratio.

B. Randomized algorithm Random-Fit 

In order to resolve this dilemma between First-Fit and
Best-Fit in maximizing net profit, we introduce an algorithm
with internal randomness to balance the high brown energy cost
that we pay right now against the high cost of losing potential
future jobs. We remark here that the randomness (i.e., the
probability $p$ in Algorithm 1 below) does not depend
on the job workload at all, and thus we do not have to derive
$p$ from any stochastic features assumed of the input. We
develop an algorithm called Random-Fit (Algorithm 1)
with parameter $p$, and we will show how to set the value of $p$
to obtain the optimal result.


Algorithm 1 Random-Fit (RF)

1: Let j denote an arriving job.
2: if there is sufficient free green energy to schedule j then
3:     schedule j at its earliest time interval;
4: else
5:     with probability p, schedule j at its earliest time interval;
6:     with probability 1 − p, schedule j at its most economic time interval.
7: end if
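
Algorithm 1 can be sketched in a few lines of Python. This is a simplified illustration rather than the authors' implementation: it assumes each job occupies the whole cluster, models energy as a per-slot unit price list in which 0 stands for green energy, and interprets "sufficient free green energy" as the earliest feasible interval being entirely green. The function name and parameters are hypothetical.

```python
import random


def random_fit(prices, busy, release, deadline, length, p, rng=random):
    """Pick a start slot for an arriving job; None means the job is rejected."""
    starts = [s for s in range(release, deadline - length + 1)
              if not any(busy[s:s + length])]
    if not starts:
        return None

    def cost(s):
        # Energy cost of running the job in slots s .. s + length - 1.
        return sum(prices[s:s + length])

    earliest = starts[0]
    if cost(earliest) == 0:
        # Green energy covers the earliest interval: no random choice needed.
        return earliest
    if rng.random() < p:
        # With probability p behave like First-Fit.
        return earliest
    # With probability 1 - p behave like Best-Fit (cheapest, then earliest).
    return min(starts, key=lambda s: (cost(s), s))


prices = [0.13, 0.08]      # slot 0 on-peak, slot 1 off-peak
busy = [False, False]
rng = random.Random(0)     # seeded only to make the demo repeatable
picks = [random_fit(prices, busy, 0, 2, 1, p=0.3, rng=rng) for _ in range(1000)]
print(picks.count(0) / len(picks))  # fraction scheduled immediately, close to p
```

Setting `p=1.0` reduces the sketch to First-Fit and `p=0.0` to Best-Fit, so the parameter interpolates between the two deterministic heuristics.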


In the following, we consider a special case in which all jobs
have the same lengths and the same node requirements. We
calculate the optimal value of $p$ in the following analysis.

Theorem 3. In scheduling jobs with the same processing
times and the same node requirements, algorithm
Random-Fit has competitive ratio

$$c = \max\left\{1 + \frac{v_{on}}{v_{off}} - \left(\frac{v_{on}}{v_{off}}\right)^2,\ 1 + \frac{v_{off}}{v_g} - \left(\frac{v_{off}}{v_g}\right)^2\right\}$$

against an oblivious adversary. This competitive ratio $c$ is no
more than 1.25.


Proof: Let OPT denote an optimal offline algorithm (an
oblivious adversary) as well as its net profit. Let RF denote
the Random-Fit algorithm as well as its expected net profit. In
order to prove this theorem, we will show that $\frac{OPT}{RF} \le 1.25$.

We employ a charging scheme to prove Theorem 3. Initially,
OPT and RF have the same energy resource and machine
resource. We consider a job arriving at time $t_1$ with the
inductive assumption that before time $t_1$, the ratio between
OPT and RF is no more than $c$ (in Theorem 3, $c \le 1.25$).
In the following, we show that after time $t_1$, the inductive
assumption still holds.

Two facts are used in the proof: (1) Randomness only plays
a role when no green energy is available (otherwise, no
random decision is needed; see Algorithm 1); and (2) if OPT
schedules a job at time $t$, then OPT schedules the earliest-deadline
job, as all jobs have the same processing times and
node requirements. We will show that the following invariant
holds: at any time, the net profit ratio between OPT and RF
is no more than $c$; also, OPT has no more remaining green
energy than RF does, if we charge appropriate revenue to OPT.
This includes the scenario in which OPT schedules a job later
with energy consumption while we charge the revenue and
the energy cost to OPT now. Once this invariant holds,
Theorem 3 holds immediately. We consider the released jobs
via a case study and use $(r, d)$ to denote a job with release time
$r$ and deadline $d$.

a) Consider two neighboring time slots $t_1$ and $t_2$
which are on-peak and off-peak respectively:

1) OPT releases one job $j_1 = (t_1, t_2)$ and OPT schedules
$j_1$ at time $t_2$, achieving a profit $v_{off}$. RF will
schedule job $j_1$ at time $t_1$ with probability $p$ and at
time $t_2$ with probability $1 - p$, earning an expected
profit $p \cdot v_{on} + (1 - p) \cdot v_{off}$. In this case, the competitive
ratio is

$$\frac{OPT}{RF} = \frac{v_{off}}{p \cdot v_{on} + (1 - p) \cdot v_{off}}.$$

2) OPT releases two jobs $j_1 = (t_1, t_2)$ and $j_2 = (t_2, t_2)$.
OPT would schedule $j_1$ at time $t_1$ and schedule $j_2$ at
time $t_2$, achieving a profit of $v_{on} + v_{off}$. RF will
schedule $j_1$ at time $t_1$ with probability $p$ and
schedule either $j_1$ or $j_2$ at time $t_2$ (due to the job
deadline constraints), earning a profit of $p \cdot v_{on} + v_{off}$.
Therefore, the competitive ratio is

$$\frac{OPT}{RF} = \frac{v_{on} + v_{off}}{p \cdot v_{on} + v_{off}}.$$

In this scenario, the competitive ratio is

$$\min_p \max\left\{\frac{v_{off}}{p \cdot v_{on} + (1 - p) \cdot v_{off}},\ \frac{v_{on} + v_{off}}{p \cdot v_{on} + v_{off}}\right\}.$$

In solving the above min-max problem, $p = \frac{x}{1 + x - x^2}$, where
$x = \frac{v_{on}}{v_{off}}$ ($0 < x < 1$), optimizes the competitive ratio:

$$\frac{OPT}{RF} = 1 + x - x^2 = 1 + \frac{v_{on}}{v_{off}} - \left(\frac{v_{on}}{v_{off}}\right)^2 \le 1.25.$$
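
The closed form for the on-peak to off-peak case can be checked by hand. The worked derivation below is a sketch that uses only the two ratios of this case; the labels $R_1$ and $R_2$ are introduced here for convenience and do not appear in the original proof.

```latex
% Divide numerators and denominators by v_off and write x = v_on / v_off.
\begin{align*}
R_1(p) &= \frac{v_{off}}{p\,v_{on} + (1-p)\,v_{off}} = \frac{1}{1 - p(1-x)}, &
R_2(p) &= \frac{v_{on}+v_{off}}{p\,v_{on}+v_{off}} = \frac{1+x}{1+px}.
\end{align*}
% R_1 increases in p and R_2 decreases in p, so the min-max is attained
% where the two ratios are equal:
\begin{align*}
1 + px &= (1+x)\bigl(1 - p(1-x)\bigr) = 1 + x - p\,(1-x^2) \\
\Longrightarrow\quad p\,(1 + x - x^2) &= x
\quad\Longrightarrow\quad p = \frac{x}{1+x-x^2}, \\
R_2(p) &= \frac{(1+x)(1+x-x^2)}{(1+x-x^2) + x^2} = 1 + x - x^2 \le \frac{5}{4},
\end{align*}
% with equality in the last step exactly at x = 1/2.
```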
















b) Consider two neighboring time slots $t_1$ and $t_2$
which are off-peak and on-peak respectively:

1) OPT releases one job $j_1 = (t_1, t_2)$. The worst case is
that OPT uses the on-peak day’s free green energy to
schedule this job $j_1$. Using the same analysis approach,
we get a competitive ratio

$$\frac{OPT}{RF} = \frac{v_g}{p \cdot v_{off} + (1 - p) \cdot v_g}.$$

2) OPT releases two jobs $j_1 = (t_1, t_2)$ and $j_2 = (t_2, t_2)$.
Similarly, we get a competitive ratio

$$\frac{OPT}{RF} = \frac{v_{off} + v_g}{p \cdot v_{off} + v_g}.$$

In this scenario, the competitive ratio is

$$\min_p \max\left\{\frac{v_g}{p \cdot v_{off} + (1 - p) \cdot v_g},\ \frac{v_g + v_{off}}{p \cdot v_{off} + v_g}\right\}.$$

Similarly, when $p = \frac{y}{1 + y - y^2}$, where $y = \frac{v_{off}}{v_g}$ ($0 < y < 1$),
we get the optimal competitive ratio $\frac{OPT}{RF} = 1 + y - y^2 \le 1.25$.
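
The min-max solution used in both cases can also be checked numerically. The following sketch is an illustration with hypothetical function names: it normalizes the cheaper-slot profit to 1, so that `x` plays the role of $v_{on}/v_{off}$ in case a) and of $v_{off}/v_g$ in case b), and compares the closed-form probability against a brute-force search over $p$.

```python
def ratio_one_job(p, x):
    """Adversary releases one job: 1 / (p*x + (1 - p))."""
    return 1.0 / (p * x + (1.0 - p))


def ratio_two_jobs(p, x):
    """Adversary releases two jobs: (x + 1) / (p*x + 1)."""
    return (1.0 + x) / (p * x + 1.0)


def worst_ratio(p, x):
    """The adversary picks whichever instance is worse for Random-Fit."""
    return max(ratio_one_job(p, x), ratio_two_jobs(p, x))


def optimal_p(x):
    """Closed-form probability from the analysis: x / (1 + x - x^2)."""
    return x / (1.0 + x - x * x)


def grid_min(x, steps=10000):
    """Brute-force minimum of the worst-case ratio over p in [0, 1]."""
    return min(worst_ratio(i / steps, x) for i in range(steps + 1))


for x in (0.2, 0.5, 0.8):
    p = optimal_p(x)
    print(x, round(p, 4), round(worst_ratio(p, x), 4), round(grid_min(x), 4))
```

The printed closed-form and brute-force values should agree to within the grid resolution, and the ratio peaks at $x = 0.5$, which is where the 1.25 bound comes from.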




Corollary 2. Random-Fit has a better competitive ratio compared
to First-Fit and Best-Fit.

Proof: In Theorem 1 and Theorem 2, the lower bounds
of the competitive ratios of First-Fit and Best-Fit are proved to be
$\max\left\{\frac{v_g}{v_{on}}, \frac{v_{off}}{v_{on}}\right\}$ and $\max\left\{1 + \frac{v_{on}}{v_{off}}, 1 + \frac{v_{off}}{v_g}\right\}$ respectively.
We use FF and BF to stand for First-Fit and Best-Fit. As the
expected competitive ratio of Random-Fit is no larger than

$$\max\left\{1 + \frac{v_{on}}{v_{off}} - \left(\frac{v_{on}}{v_{off}}\right)^2,\ 1 + \frac{v_{off}}{v_g} - \left(\frac{v_{off}}{v_g}\right)^2\right\},$$

we have $\frac{OPT}{RF} < \frac{OPT}{BF}$. Since $1 + k - k^2 < 1/k$ (where $0 < k < 1$), we
have $1 + \frac{v_{on}}{v_{off}} - \left(\frac{v_{on}}{v_{off}}\right)^2 < \frac{v_{off}}{v_{on}}$ and
$1 + \frac{v_{off}}{v_g} - \left(\frac{v_{off}}{v_g}\right)^2 < \frac{v_g}{v_{off}}$;
then we have $\frac{OPT}{RF} < \frac{OPT}{FF}$. ■

Corollary 3. The optimal randomness (probability) for
Random-Fit is $p = \frac{x}{1 + x - x^2}$ with $x = \frac{v_{on}}{v_{off}}$, and
$p' = \frac{y}{1 + y - y^2}$ with $y = \frac{v_{off}}{v_g}$, in scheduling jobs
from on-peak time to off-peak time and from off-peak time to
on-peak time respectively.

We remark here that using Yao’s principle [35], [?], the
lower bound of online randomized algorithms in the general
case can be easily derived from our competitive analysis of
deterministic algorithms.

C. On offline algorithms and resource augmentation approach
for analyzing online algorithms

An optimal offline algorithm for the cases in which jobs
have the same processing times and node requirements can be
formulated as a linear program.

Consider an online algorithm and its competitive analysis,
as done in Section II-B. The competitive ratio is
commonly used to measure an online algorithm’s performance
degradation when the future input information is totally unknown.
A theoretical measure called resource augmentation
was introduced, especially for analyzing online algorithms
with poor worst-case performance [?], [?], [?]. The underlying
idea is to increase the scarce resource to compensate for
the loss due to limited knowledge about the future.

An online algorithm ALG is called $a$-resource $c$-competitive
if the online algorithm is given $a$ times the resources while
the adversary has only one unit of resource, and ALG’s
performance is no worse than $1/c$ times what the adversary (ADV)
achieves. Evidence has shown that if $a$ is small while $c$
decreases significantly, then the algorithm ALG has much
better practical performance than what its theoretical bounds
show. In the problem GDC-RM, green energy is a
scarce resource. We have the following negative result on the
resource augmentation approach.

Theorem 4. The lower bound of the competitive ratio for $a$-times
green energy augmentation is no better than
$\max\left\{\frac{\cdots}{v_{off}},\ 1 + \frac{\cdots}{v_g}\right\}$.

The formulation of the optimal offline algorithm is shown
in the Appendix.

III. Performance Evaluation


In this section, we experimentally evaluate our
randomized algorithm against the GreenSlot [?] scheduler as well
as the two widely used deterministic online algorithms First-Fit
and Best-Fit, which have been adopted, with possible tuning,
in previous literature. An optimal offline algorithm is also
developed, although its running time is long when the input
size is large. The optimal offline algorithm is formulated as a
binary integer program. We first introduce the simulation
settings, then explain the simulation methodology, and
finally report the simulation results and analysis.

A. Simulation settings 

Data center: The simulated green data center is configured
similarly to the one in [?] but with more machines
(nodes). The data center is a cluster of 100 machines,
each consuming 140 W while running jobs. The total energy
consumption is the sum of the energy consumed by the
machines while they are processing jobs over time.

Green energy: We use the solar energy trace from the
Computer Science Weather Station at the University of
Massachusetts, Amherst [?]. The trace is fine-grained:
readings are collected every 5 minutes. We scale down the solar
energy trace to make it compatible with the simulated data
center, so that the peak solar power just covers the maximum
possible power consumption. We select an arbitrary 5-day
period as the solar energy input.
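The scaling step can be sketched as follows. This is a minimal illustration, assuming the trace is a list of power readings in watts taken at 5-minute intervals; the function name `scale_solar_trace` and the sample readings are hypothetical, and the 14 kW target is simply 100 machines times 140 W.

```python
def scale_solar_trace(trace_watts, machines=100, watts_per_machine=140.0):
    """Rescale a solar power trace so that its peak equals the cluster's
    maximum possible power draw (all machines busy)."""
    peak = max(trace_watts)
    if peak <= 0:
        raise ValueError("trace has no positive reading to scale against")
    max_demand = machines * watts_per_machine  # 100 * 140 W = 14,000 W
    factor = max_demand / peak
    return [w * factor for w in trace_watts]


# Hypothetical 5-minute readings in watts (not taken from the real trace).
sample = [0.0, 1200.0, 48000.0, 56000.0, 30000.0, 0.0]
scaled = scale_solar_trace(sample)
```

With this choice of factor, the data center can run entirely on solar power only at the moment of peak production, and needs brown energy or deferral at other times.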

Brown energy price: The brown energy price differs between
on-peak and off-peak periods: electricity costs less at
off-peak and more at on-peak periods. We use the summer prices
charged by PSEG in New Jersey [?] as an example:
on-peak price (from 9am to 11pm) $0.13/kWh, off-peak price
(from 11pm to 9am) $0.08/kWh.

Service pricing: The green data center service provider
charges the clients for the computing resources consumed. We
set the service price based on Amazon EC2's pricing [?]. The
charging price is set to $0.022/h per machine.
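To see how these settings interact, the per-machine-hour margin can be worked out directly. The sketch below is only an illustration: it assumes the energy prices are in US dollars per kWh and that green (solar) energy has zero marginal cost, neither of which is stated explicitly above, and the function name is hypothetical.

```python
MACHINE_POWER_KW = 0.140      # 140 W per busy machine
SERVICE_PRICE = 0.022         # $ per machine-hour
BROWN_PRICE = {"on_peak": 0.13, "off_peak": 0.08}  # $ per kWh (assumed USD)


def net_profit_per_machine_hour(source):
    """Net profit of running one machine for one hour.

    Assumption: green energy costs nothing at the margin; brown energy
    is billed at the on-peak or off-peak price.
    """
    if source == "green":
        energy_cost = 0.0
    else:
        energy_cost = MACHINE_POWER_KW * BROWN_PRICE[source]
    return SERVICE_PRICE - energy_cost


for source in ("green", "off_peak", "on_peak"):
    print(source, round(net_profit_per_machine_hour(source), 4))
```

Under these assumptions, a machine-hour earns about $0.022 on green energy, $0.0108 on off-peak brown energy, and $0.0038 on on-peak brown energy, which is why shifting work toward green or off-peak periods matters for net profit.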


















Workloads: We use the real workload trace Grid5k as the
workload input in our simulation. Grid5k [?] is a real
workload trace collected from the Grid'5000 system [?],
a 2218-node experimental grid platform consisting
of 9 sites geographically distributed in France, from May 2004
to November 2006.

We randomly select two five-day workloads, denoted
Grid5k-1 and Grid5k-2, as the workload input in the simulation.
To simulate various workload utilizations, we randomly
sample jobs to create the simulation workloads. The job
processing times and node requirements are also rescaled to
fit the size of the simulated data center.

B. Methodology 

We evaluate the performance of the online algorithms under
various types of workloads and various values of the least
service quality L. The workload utilization ranges from 10%
to 150%. Since Random-Fit's randomness factor is internal
to the algorithm, we do not need to tune it. Each simulation
is repeated 30 times and we compare the average values.
To compare the online algorithms thoroughly, we conduct
large-scale simulations (100 machine nodes). Due to
the high running time of the offline algorithm, we
use smaller-scale parameters (16 machine nodes) when
comparing the online algorithms with the optimal
offline algorithm.

C. Result and analysis 

We first present the evaluation of the online algorithms
Random-Fit, First-Fit, Best-Fit and GreenSlot under various
settings. Then we compare the online algorithms
with the optimal offline algorithm to confirm the
theoretical competitive ratio analysis.

1) Comparison of online algorithms: We compare the online
algorithms on the profits they achieve. To compare
their competitive ratios, we normalize the profit of each
algorithm by that of the best-performing algorithm
under each setting, as detailed below.

First, for each setting (workload utilization and least
service quality L), we take the profit of the most profitable
algorithm as a surrogate optimum OPT'. Then we compute
a lower bound on the competitive ratio as OPT'/ALG, where
ALG is the net profit gained by an online algorithm. Since
OPT' is at most the true optimum OPT, we have
OPT'/ALG <= OPT/ALG, so the derived value is only a lower
bound on the real competitive ratio. This comparison is still
sufficient to indicate that our algorithm has a better
worst-case competitive ratio than First-Fit, Best-Fit, and
GreenSlot.
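The normalization just described can be sketched in a few lines. The profit figures below are hypothetical placeholders, not simulation results, and the sketch assumes every algorithm earns a strictly positive net profit in the setting considered.

```python
def ratio_lower_bounds(profits):
    """Given {algorithm: net profit} for one setting, return
    {algorithm: OPT'/ALG}, where OPT' is the best profit observed.

    Assumes all profits are strictly positive.
    """
    opt_prime = max(profits.values())
    return {name: opt_prime / p for name, p in profits.items()}


# Hypothetical profits for a single (utilization, L) setting.
profits = {
    "First-Fit": 80.0,
    "Best-Fit": 100.0,
    "Random-Fit": 95.0,
    "GreenSlot": 98.0,
}
bounds = ratio_lower_bounds(profits)
```

By construction the best algorithm in a setting gets the value 1, and every other value is at least 1; an algorithm's worst case is its largest value across all settings.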

Figure 2 and Figure 3 show the lower bounds of the
competitive ratios of the algorithms under various workload
settings, with least service quality L = 0.2 and L = 0.05
respectively. From these figures, we observe that Best-Fit tends
to gain a better profit when the data center utilization is
lower than 60%, while First-Fit is better when the data center
utilization is higher than about 80%. Whatever the data center
utilization, our proposed algorithm always guarantees a better
worst-case performance. Note that an online algorithm cannot
precisely predict a data center's long-term utilization at either
fine-grained or coarse-grained levels. Therefore, alternating
between the two algorithms First-Fit and Best-Fit cannot
achieve a better worst-case performance than Random-Fit.

Fig. 2. Lower bounds of competitive ratio under different workloads
with L = 0.2

Fig. 3. Lower bounds of competitive ratio under different workloads
with L = 0.05

Best-Fit is less profitable when the data center utilization is
high because it tends to delay scheduling jobs in order
to consume less expensive energy. This delayed scheduling
results in many jobs missing their deadlines and thus
in a lower profit. First-Fit, in contrast, always schedules jobs
in the first available time slots and can therefore schedule more
jobs than the other algorithms. In the simulation, we observe that
First-Fit schedules 20% more workload than Best-Fit and
GreenSlot, and around 10% more workload than Random-Fit,
under moderate workloads (utilization of 70% to 100%). However,
First-Fit cannot make good use of green energy when the data
center has low utilization: its green energy utilization is less than
70% of that of Best-Fit when the workload utilization is around
10% to 60%. Random-Fit strikes a balance between
the amount of workload scheduled and the amount of green
energy consumed, and thus tends to have a better competitive
ratio.
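The contrast between the two deterministic rules can be illustrated with a simplified single-job sketch. It is not the exact pair of algorithms evaluated here: it ignores machine capacity and green energy, and assumes each time slot has a known energy cost. All function names and the example costs are hypothetical.

```python
def feasible_starts(release, deadline, length):
    """Start slots t such that the job runs in [t, t + length) and
    finishes no later than its deadline."""
    return range(release, deadline - length + 1)


def first_fit_start(release, deadline, length, slot_cost):
    """Earliest feasible start, ignoring energy cost."""
    starts = feasible_starts(release, deadline, length)
    return starts[0] if len(starts) > 0 else None


def best_fit_start(release, deadline, length, slot_cost):
    """Feasible start with the lowest total energy cost
    (ties are broken in favor of the earliest start)."""
    starts = feasible_starts(release, deadline, length)
    if len(starts) == 0:
        return None
    return min(starts, key=lambda t: sum(slot_cost[t:t + length]))


# Hypothetical per-slot energy costs: expensive early, cheap late.
slot_cost = [5, 5, 5, 1, 1, 1]
ff = first_fit_start(0, 6, 2, slot_cost)   # starts immediately
bf = best_fit_start(0, 6, 2, slot_cost)    # waits for the cheap slots
```

In this toy setting the first rule starts at slot 0 and the second at slot 3. Deferring saves energy cost for one job, but when many jobs are deferred toward the same cheap slots, capacity runs out and deadlines are missed, which matches the behavior observed at high utilization.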

Taking the above analysis one step further, we conclude that
if the data center utilization were predictable, then an adaptive
scheduling algorithm which dynamically switches between
Best-Fit and First-Fit according to the data center's utilization
over a long-enough scheduling window would have a better
performance. However, data center utilization is usually
hard to predict [32].

In the simulation, we also find that GreenSlot is sensitive to
the value of the least service quality L. Its performance is
very close to that of Best-Fit when L is relatively small, i.e.,
when the job span is relatively large. This is because the penalty
for delaying a job is rarely effective when jobs have a
relatively small least service quality: the penalty is
imposed only when a job is about to miss its deadline (for
example, 20% of its required processing time ahead of its
deadline).

The running time of these algorithms for scheduling a job
is on the order of several milliseconds, which is negligible
compared to the job's processing time, usually several
minutes or hours. Specifically, First-Fit runs fastest,
Random-Fit is second, and GreenSlot and Best-Fit have almost
the same running time.

Based on our simulation results, we remark that Random-Fit 
is the best algorithm (in terms of competitive ratio and profit 
maximization). 

2) Comparison with offline algorithm: We further conduct
simulations to confirm the theoretical result that Random-Fit
has a better worst-case competitive ratio when jobs are of
the same lengths and sizes. We implement an optimal offline
algorithm to obtain the real experimental competitive ratios.
The offline algorithm is formulated as a binary integer
program and is solved with the LINDO solver. Note that the
optimal algorithm is very time consuming, so we shrink the
number of nodes in the data center from 100 to 16 in order to
get the optimal result within a reasonable time.

In this simulation, we use two uniform workloads with
utilization 10% and 100% respectively. We compare the online
algorithms against the optimal offline algorithm using
the competitive ratio. For ease of presentation, we abbreviate
First-Fit, Best-Fit, Random-Fit, GreenSlot and the
offline optimal algorithm as FF, BF, RF, GS and OPT respectively.

TABLE III

COMPETITIVE RATIO OF ONLINE ALGORITHMS

Metric                                 FF      BF      RF      GS
Competitive ratio (workload = 10%)     1.56    1.03    1.16    1.03
Competitive ratio (workload = 100%)    1.05    1.29    1.24    1.27

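Table III shows the competitive ratios of the online
algorithms under the two workload utilizations. First-Fit,
Best-Fit and GreenSlot each have a competitive ratio in at
least one of the two settings (1.56, 1.29 and 1.27 respectively)
that is worse than the theoretical upper bound (1.25) of
Random-Fit, whereas the measured ratios of Random-Fit (1.16
and 1.24) stay below this bound.
This observation agrees with our theoretical results.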
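The reading of Table III can be checked mechanically. The snippet below takes the eight reported ratios and the stated bound of 1.25 as given; the variable names are illustrative only.

```python
# Competitive ratios as reported in Table III.
table_iii = {
    "FF": {"10%": 1.56, "100%": 1.05},
    "BF": {"10%": 1.03, "100%": 1.29},
    "RF": {"10%": 1.16, "100%": 1.24},
    "GS": {"10%": 1.03, "100%": 1.27},
}
RF_THEORETICAL_BOUND = 1.25  # upper bound stated for Random-Fit

# Worst (largest) ratio of each algorithm over the two workloads.
worst_case = {alg: max(r.values()) for alg, r in table_iii.items()}

# Algorithms whose worst observed ratio exceeds Random-Fit's bound.
exceeds = sorted(a for a, w in worst_case.items() if w > RF_THEORETICAL_BOUND)
```

Each deterministic algorithm is close to optimal at one utilization level and clearly worse at the other, while the randomized algorithm is never the best but never far off.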

IV. Conclusions 

In this work, we study online scheduling of energy and jobs 
in green data centers with the objective of maximizing net 
profit. In our problem setting, energy costs are time-sensitive 
and so is the net profit. Prior work employs deterministic 
approaches only and the underlying algorithmic ideas are 
either First-Fit or Best-Fit; furthermore no theoretical analysis 
has been given. In this paper, competitive analysis is used 
to measure an online algorithm’s theoretical performance. 
We conclude that randomness plays an important role in 
maximizing net profit. Experiments on real workload traces
have shown that our algorithm indeed outperforms the previous
ones, as the theory indicates.
Appendix

A. Offline Algorithm 

We formulate a linear program for the special case in which
all jobs have the same processing time and node requirement. Let
g(t) denote the amount of green energy arriving at time t
and let b(t) denote the unit brown energy price at time t.
Assume all jobs have the same processing time of p slots and
the same node requirement q. Let vj denote the revenue earned by
scheduling job j. Let yj be an indicator variable denoting
whether job j is scheduled (yj = 1) or not (yj = 0). Let
s[j, t] be an indicator variable denoting whether job j is started
at time t (s[j, t] = 1) or not (s[j, t] = 0). Let n(t) denote the
number of jobs started at time t. Let e(t) denote the energy
demand at time t.
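For very small instances, the quantities defined above can be explored by exhaustive search. The sketch below is not the linear program itself and is only practical for a handful of jobs. It additionally assumes that each job has a release time and a deadline, that each busy node consumes one energy unit per slot, that green energy is free and cannot be stored, and that the cluster has a fixed number of machines. The function name and the example data are hypothetical.

```python
from itertools import product


def best_offline_schedule(jobs, p, q, machines, g, b):
    """Exhaustively search start times for identical-shape jobs.

    jobs: list of (release, deadline, value) tuples.
    g[t], b[t]: green energy and unit brown price in slot t.
    Returns (best_profit, starts), where starts[j] is the start slot
    of job j, or None if job j is rejected.
    """
    horizon = len(g)
    options = []
    for release, deadline, _ in jobs:
        last = min(deadline, horizon) - p      # latest start meeting the deadline
        options.append([None] + list(range(release, last + 1)))

    best_profit, best_starts = 0.0, tuple(None for _ in jobs)
    for starts in product(*options):
        nodes = [0] * horizon                  # busy nodes per slot
        for s in starts:
            if s is not None:
                for t in range(s, s + p):
                    nodes[t] += q
        if max(nodes, default=0) > machines:   # capacity violated
            continue
        revenue = sum(v for (_, _, v), s in zip(jobs, starts) if s is not None)
        # e(t) = nodes[t] energy units; only the part above g(t) is bought.
        brown_cost = sum(b[t] * max(0, nodes[t] - g[t]) for t in range(horizon))
        profit = revenue - brown_cost
        if profit > best_profit:
            best_profit, best_starts = profit, starts
    return best_profit, best_starts


# Two identical jobs, green energy only in the last two slots.
profit, starts = best_offline_schedule(
    jobs=[(0, 4, 3.0), (0, 4, 3.0)], p=2, q=2, machines=2,
    g=[0, 0, 2, 2], b=[1, 1, 1, 1],
)
```

In this example only one job is accepted and it runs in the green slots: running the second job on brown energy would cost more than the revenue it brings.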

We have the following formulation. 
B. Hardness of the Problem GDC-RM

Note that GDC-RM is essentially not an offline problem,
since the jobs and the green energy cannot be modeled and
predicted precisely at all times. However, understanding
the hardness of the offline version is useful in
evaluating an online algorithm's theoretical and empirical
performance. We prove that the offline version of GDC-RM
is NP-hard, using a reduction from the well-known NP-hard
problem 'Knapsack' [?].

Theorem 5. The offline version of the problem GDC-RM is 
NP-hard. 

Proof: Given a candidate solution, it takes polynomial
time to verify whether it is a feasible schedule.
Thus, the problem GDC-RM belongs to NP. In the
following, we prove that GDC-RM is NP-hard by showing a
polynomial-time reduction from the Knapsack problem to it.
In the Knapsack problem, there are a knapsack of capacity W
and n items, where item i has size si. The goal is to make the
knapsack as full as possible. The Knapsack problem is known
to be NP-hard [?].

Consider the problem GDC-RM. Assume the produced
green energy has a budget of B in a scheduling window and
the brown energy's costs are high enough
that any use of brown energy yields no positive net profit.
Therefore, to maximize the net profit, we would like to
find a set of jobs that consumes as much of the green energy
budget B as possible, but no more, without
using any brown energy. In particular, we restrict
the green energy to be available within a scheduling window
[t, t'], and all jobs j to have the same release time t and
(possibly different) deadlines dj no later than t', set so that
all jobs have the same service quality L. Here t and t' are the
boundaries of the scheduling window and t' is the latest time
at which green energy is still available. Let t' - t = W. Also,
we restrict each job j to have qj = 1. This conversion takes
time linear in the number of jobs.

If we have a polynomial-time optimal algorithm for the
problem GDC-RM on the special input instances created
above, then we have an optimal solution to the
following Knapsack problem: the knapsack has capacity
W = t' - t and each item j has size pj. As the
Knapsack problem is NP-hard, the problem GDC-RM is
NP-hard. ■
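The correspondence used in the proof can be made concrete with a small sketch. It assumes integer item sizes, one energy unit per node per slot (so a job with processing time pj and qj = 1 consumes pj units), and a green budget equal to the knapsack capacity W. The function names and the dictionary fields are hypothetical.

```python
def knapsack_to_jobs(sizes):
    """Map each item of size s_i to a job with processing time s_i
    and node requirement 1."""
    return [{"p": s, "q": 1} for s in sizes]


def max_green_usage(jobs, budget):
    """Largest total energy (p * q summed over the chosen jobs) that
    does not exceed the green budget. Standard subset-sum dynamic
    program over the set of reachable totals."""
    reachable = {0}
    for job in jobs:
        demand = job["p"] * job["q"]
        reachable |= {r + demand for r in reachable if r + demand <= budget}
    return max(reachable)


# Items of size 3, 5 and 7 with capacity W = 11.
jobs = knapsack_to_jobs([3, 5, 7])
best_fill = max_green_usage(jobs, budget=11)
```

Here the best fill is 10 (sizes 3 and 7). The dynamic program runs in time proportional to the number of jobs times the budget, which is pseudo-polynomial and therefore does not contradict NP-hardness.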



